A concentration theorem for the equilibrium measure of Markov chains with nonnegative coarse Ricci curvature

Laurent Veysseire 

Abstract 



In this article, we prove a concentration inequality of the order of the exponential of a double integral of the coarse Ricci curvature for the equilibrium measure of a Markov chain, in the case when this curvature is nonnegative. This is, to the author's knowledge, the first concentration result in a discrete setting using a non-constant curvature instead of its infimum.

Introduction



For a Markov chain on a Polish space, a nonnegative coarse Ricci curvature means that the distributions after one step of the chain are closer (in the sense of the $W_1$ distance) than their starting points are [1]. Recall that the $W_1$ (Wasserstein) distance between two probability measures is the infimum, over the set of couplings of these two probability measures, of the expectation of the distance between the two points.

In the case when the space is $\varepsilon$-geodesic (see Definition 2), a nonnegative coarse Ricci curvature allows one to extend the local attractiveness of a point $x_0$ to a global one (see [1] or Lemma 7). The attractiveness of a point implies exponential concentration of the equilibrium probability measure around this point, if the Markov chain does not spread out too quickly.

One of the simplest examples is the random walk on $\mathbb{N}$ where we jump from $n$ to $n+1$ with probability $p$ and to $(n-1)_+$ with probability $1-p$. In this case, the coarse Ricci curvature is 0. If $p < \frac{1}{2}$, then $0$ is attractive and we have exponential concentration. If $p \ge \frac{1}{2}$, then there is no attractive point, and no invariant probability measure either.
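The random walk example can be checked numerically. The following is a minimal sketch, assuming the walk is truncated to a finite window $\{0, \dots, n_{\max}\}$ (the truncation and the function names are illustrative choices, not part of the article). It uses detailed balance, $\pi(n+1)/\pi(n) = p/(1-p)$, to build the equilibrium measure and exhibits its geometric, hence exponential, tail.

```python
def stationary_reflected_walk(p, n_max):
    """Equilibrium law of the walk n -> n+1 (prob. p), n -> max(n-1, 0) (prob. 1-p),
    truncated to {0, ..., n_max} (assumption made to keep the state space finite).
    Detailed balance gives pi(n+1) / pi(n) = p / (1 - p)."""
    if not 0 < p < 0.5:
        raise ValueError("an invariant probability measure exists only for p < 1/2")
    ratio = p / (1 - p)
    weights = [ratio ** n for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def tail(pi, l):
    """Mass of {n >= l} under the law pi."""
    return sum(pi[l:])


if __name__ == "__main__":
    pi = stationary_reflected_walk(0.25, 200)
    # The tail decays like (p / (1 - p)) ** l, here (1/3) ** l.
    for l in (1, 5, 10):
        print(l, tail(pi, l))
```

For $p = 1/4$ the tail at level $l$ is, up to a negligible truncation error, $(1/3)^l$.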



Here we prove that the concentration of the equilibrium measure 
around an attractive point behaves at least like the exponential of a 
double integral of the coarse Ricci curvature. 

We may remark that this is the right behaviour of the invariant 
distribution for diffusion processes on the real line, as we see in the 
example below. 

Example 1 Let us consider a diffusion process on the real line whose generator takes the form:

$$Lf = \frac{d^2 f}{dx^2} - \frac{dV}{dx}\frac{df}{dx}$$

where the energy $V(x)$ is smooth. Then the coarse Ricci curvature is $\frac{d^2 V}{dx^2}$, and the measure $e^{-V(x)}dx$ is reversible. We see that the logarithm of the density of the invariant measure is, up to sign, exactly a double integral of the coarse Ricci curvature.



1 The concentration Theorems 

We define $\varepsilon$-geodesic spaces as in [3].

Definition 2 Let $\varepsilon > 0$. A metric space $(X,d)$ is said to be $\varepsilon$-geodesic if for each $(x,y) \in X^2$, there exist $n \in \mathbb{N}$ and a sequence $x = x_0, x_1, \dots, x_n = y \in X$ such that $d(x_i, x_{i+1}) \le \varepsilon$ for each $0 \le i \le n-1$ and $d(x,y) = \sum_{i=0}^{n-1} d(x_i, x_{i+1})$.

For a Markov chain with transition kernel $P$ on an $\varepsilon$-geodesic space, we will denote by $K_\varepsilon(x)$ the local coarse Ricci curvature at $x$:

$$K_\varepsilon(x) := \inf_{y \in X \mid 0 < d(x,y) \le \varepsilon} \kappa(x,y)$$

where $\kappa(x,y) := 1 - \frac{W_1(P_x, P_y)}{d(x,y)}$ is the coarse Ricci curvature between $x$ and $y$ as defined in [3].
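To make the definition of $\kappa(x,y)$ concrete, here is a small sketch for Markov chains on the integers, where the $W_1$ distance between two laws equals the $L^1$ distance between their cumulative distribution functions. The representation of a law as a dictionary and the helper names are assumptions made for this illustration only.

```python
def w1_integers(mu, nu):
    """W1 distance between two laws on Z, each given as a dict {site: mass}.
    On the line, W1 is the sum over n of |F_mu(n) - F_nu(n)|."""
    lo = min(min(mu), min(nu))
    hi = max(max(mu), max(nu))
    cdf_gap, total = 0.0, 0.0
    for n in range(lo, hi):
        cdf_gap += mu.get(n, 0.0) - nu.get(n, 0.0)
        total += abs(cdf_gap)
    return total


def kappa(kernel, x, y):
    """Coarse Ricci curvature 1 - W1(P_x, P_y) / d(x, y) for distinct integers x, y."""
    return 1.0 - w1_integers(kernel(x), kernel(y)) / abs(x - y)


def reflected_walk(p):
    """Kernel of the walk n -> n+1 (prob. p), n -> max(n-1, 0) (prob. 1-p)."""
    def kernel(n):
        law = {n + 1: p}
        down = max(n - 1, 0)
        law[down] = law.get(down, 0.0) + (1.0 - p)
        return law
    return kernel


if __name__ == "__main__":
    P = reflected_walk(0.3)
    print(kappa(P, 3, 4))  # away from the boundary the two laws are translates
```

Away from the boundary point 0, the laws $P_x$ and $P_y$ are translates of each other, so the curvature between neighbours vanishes.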

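Here we will prove the following concentration result for the equilibrium measure of Markov chains:

Theorem 3 Let $X$ be an $\varepsilon$-geodesic metric space and $P$ be the transition kernel of a Markov chain on $X$. Assume that:

• there exist $\rho > 0$ and a point $x_0$ such that $x_0$ is $\rho$-attractive for the Markov chain, in the sense that

$$\forall x \mid \varepsilon < d(x,x_0) \le 2\varepsilon, \quad W_1(\delta_{x_0}, P_x) \le d(x_0,x) - \rho,$$

• there exists a non-increasing function $K : \mathbb{R}_+ \to \mathbb{R}_+$ satisfying:

$$K_\varepsilon(x) \ge K(d(x,x_0)),$$

• there exists $s > 0$ such that for any $x \in X$, any 1-Lipschitz function $f : X \to \mathbb{R}$ and any $\lambda \in \mathbb{R}$, we have:

$$\mathbb{E}_{P_x}\left[e^{\lambda f}\right] \le e^{\lambda \mathbb{E}_{P_x}[f] + \frac{\lambda^2 s^2}{2}}.$$

Then we have, for every $l \ge 2\varepsilon + \frac{\ln(2)s^2}{\rho}$ and any equilibrium measure $\pi$:

$$\mathbb{P}_{x\sim\pi}(d(x,x_0) \ge l) \le C e^{-\frac{\Phi(l)}{2s^2}}$$

with

$$\Phi(l) := \rho l + \int_{2\varepsilon}^{l}\left(\int_{2\varepsilon}^{u} K(v)\,dv\right)du$$

and

$$C := \frac{\exp\left(\frac{3\varepsilon}{2s^2}\max\left(3\varepsilon, \rho + \frac{\ln(2)s^2}{\rho}\right) - \frac{\rho^2}{4s^2} + \frac{1}{2s^2}\left(\rho\left(2\varepsilon + \frac{\ln(2)s^2}{\rho}\right) + \int_{2\varepsilon}^{2\varepsilon + \frac{\ln(2)s^2}{\rho}}\int_{2\varepsilon}^{u} K(v)\,dv\,du\right)\right)}{1 - e^{-\frac{\rho^2}{4s^2}}}.$$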
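As a rough numerical companion to Theorem 3, the sketch below evaluates the threshold $2\varepsilon + \ln(2)s^2/\rho$ and the exponent $\Phi(l) = \rho l + \int_{2\varepsilon}^{l}\int_{2\varepsilon}^{u} K(v)\,dv\,du$, taking these two formulas as read from the statement and from the proof given later. The function names, the midpoint quadrature and the sample function $K$ are illustrative assumptions, not part of the article. The double integral is reduced to a single one by Fubini: $\int_{2\varepsilon}^{l}\int_{2\varepsilon}^{u} K(v)\,dv\,du = \int_{2\varepsilon}^{l} (l - v)K(v)\,dv$.

```python
import math


def d0_threshold(eps, rho, s):
    """Smallest radius l covered by the bound: 2*eps + ln(2) * s**2 / rho."""
    return 2.0 * eps + math.log(2.0) * s ** 2 / rho


def phi(l, eps, rho, K, steps=2000):
    """Phi(l) = rho*l + int_{2eps}^{l} (l - v) K(v) dv, by the midpoint rule.
    K is any non-increasing, nonnegative function of the distance to x_0."""
    a = 2.0 * eps
    if l <= a:
        return rho * l
    h = (l - a) / steps
    integral = 0.0
    for i in range(steps):
        v = a + (i + 0.5) * h
        integral += (l - v) * K(v)
    return rho * l + integral * h


if __name__ == "__main__":
    # With a constant curvature bound K = c the exponent is quadratic in l:
    # Phi(l) = rho*l + c * (l - 2*eps)**2 / 2, i.e. Gaussian-type concentration.
    print(d0_threshold(1.0, 0.5, 1.0))
    print(phi(10.0, 1.0, 0.5, lambda v: 0.1))
```

With $K \equiv 0$ the exponent reduces to $\rho l$, which is the purely exponential regime.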



Remark 4 If K = 0, we obtain exponential concentration, as proved 
in g]/. 

Proposition 5 If closed balls are compact, then under the hypotheses of Theorem 3, there exists an equilibrium measure.

Remark 6 In the case when $\int_0^\infty K(r)\,dr = \infty$ and, for some (hence any) $x_0 \in X$, $W_1(\delta_{x_0}, P_{x_0}) < \infty$, then for any $\rho > 0$, there exists an $\varepsilon > 0$ large enough such that $x_0$ is $\rho$-attractive. This is a trivial consequence of the Lemma below.

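Lemma 7 Let $X$ be an $\varepsilon$-geodesic metric space and $P$ be the transition kernel of a Markov chain such that there exist a non-increasing function $K : \mathbb{R}_+ \to \mathbb{R}_+$ and a point $x_0 \in X$ satisfying:

$$K_\varepsilon(x) \ge K(d(x,x_0)).$$

Then we have

$$\mathbb{E}_{y\sim P(x)}[d(x_0,y)] \le d(x_0,x) - F(d(x_0,x)),$$

where

$$F(l) := \begin{cases} \rho + \int_{2\varepsilon}^{l} K(u)\,du & \text{if } 2\varepsilon < l \\ \rho & \text{if } \varepsilon < l \le 2\varepsilon \\ -J(x_0) & \text{if } l \le \varepsilon \end{cases}$$

with $\rho := \inf_{x \mid \varepsilon < d(x,x_0) \le 2\varepsilon} d(x,x_0) - W_1(P(x), \delta_{x_0})$ and $J(x_0) = W_1(P(x_0), \delta_{x_0})$.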

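Proof:

If $\varepsilon < d(x,x_0) \le 2\varepsilon$, this is just the definition of $\rho$. If $d(x,x_0) \le \varepsilon$, we have $K_\varepsilon(x_0) \ge 0$, so $W_1(P(x), \delta_{x_0}) \le W_1(P(x),P(x_0)) + W_1(P(x_0), \delta_{x_0}) \le d(x,x_0) + J(x_0)$. If $d(x,x_0) > 2\varepsilon$, there exist $x_1, \dots, x_n = x$ such that $d(x_i,x_{i+1}) \le \varepsilon$, $\varepsilon < d(x_1,x_0) \le 2\varepsilon$ and $d(x,x_0) = d(x_1,x_0) + \sum_{i=1}^{n-1} d(x_i,x_{i+1})$. We have then

$$\begin{aligned}
W_1(P(x), \delta_{x_0}) &\le W_1(P(x_1), \delta_{x_0}) + \sum_{i=1}^{n-1} W_1(P(x_i), P(x_{i+1})) \\
&\le d(x_1,x_0) - \rho + \sum_{i=1}^{n-1} (1 - K(d(x_i,x_0)))\,d(x_i,x_{i+1}) \\
&\le d(x,x_0) - \rho - \sum_{i=1}^{n-1} \int_{d(x_i,x_0)}^{d(x_{i+1},x_0)} K(l)\,dl \\
&\le d(x,x_0) - \rho - \int_{d(x_1,x_0)}^{d(x,x_0)} K(l)\,dl \\
&\le d(x,x_0) - F(d(x,x_0)). \qquad \square
\end{aligned}$$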

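Lemma 8 Let $\mu$ be a probability measure on $X$ and $s > 0$ be such that for any 1-Lipschitz function $f$ and any $\lambda \in \mathbb{R}$, we have the following inequality:

$$\mathbb{E}_\mu\left[e^{\lambda f}\right] \le e^{\lambda \mathbb{E}_\mu[f] + \frac{\lambda^2 s^2}{2}}.$$

Then, for each $C^1$ function $g : \mathbb{R} \to \mathbb{R}$ such that $g'$ is Lipschitz and $\|g'\|_{\mathrm{Lip}} < \frac{1}{s^2}$, and for each 1-Lipschitz function $f$, we have:

$$\mathbb{E}_\mu\left[e^{g\circ f}\right] \le \frac{\exp\left(g(\mathbb{E}_\mu[f]) + \frac{s^2 g'(\mathbb{E}_\mu[f])^2}{2(1 - s^2\|g'\|_{\mathrm{Lip}})}\right)}{\sqrt{1 - s^2\|g'\|_{\mathrm{Lip}}}}.$$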



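Proof:

For each $x \in X$, we have

$$e^{g\circ f(x)} \le e^{g(\mathbb{E}_\mu[f]) + (f(x) - \mathbb{E}_\mu[f])\,g'(\mathbb{E}_\mu[f]) + \frac{(f(x) - \mathbb{E}_\mu[f])^2}{2}\|g'\|_{\mathrm{Lip}}}.$$

Now we use the fact that the Laplace transform of a Gaussian measure $\mathcal{N}(M,\sigma^2)$ is:

$$\int_{-\infty}^{+\infty} e^{\lambda u}\, e^{-\frac{(u-M)^2}{2\sigma^2}} \frac{du}{\sqrt{2\pi\sigma^2}} = e^{\lambda M + \frac{\lambda^2\sigma^2}{2}}.$$

So, taking $\lambda = f(x) - \mathbb{E}_\mu[f]$, $M = g'(\mathbb{E}_\mu[f])$ and $\sigma^2 = \|g'\|_{\mathrm{Lip}}$, we get:

$$e^{g(f(x))} \le e^{g(\mathbb{E}_\mu[f])} \int_{-\infty}^{+\infty} e^{u(f(x) - \mathbb{E}_\mu[f])}\, e^{-\frac{(u - g'(\mathbb{E}_\mu[f]))^2}{2\|g'\|_{\mathrm{Lip}}}} \frac{du}{\sqrt{2\pi\|g'\|_{\mathrm{Lip}}}}.$$

Integrating this inequality with respect to $\mu$ and using our assumption yields:

$$\mathbb{E}_\mu\left[e^{g\circ f}\right] \le e^{g(\mathbb{E}_\mu[f])} \int_{-\infty}^{+\infty} e^{\frac{s^2u^2}{2}}\, e^{-\frac{(u - g'(\mathbb{E}_\mu[f]))^2}{2\|g'\|_{\mathrm{Lip}}}} \frac{du}{\sqrt{2\pi\|g'\|_{\mathrm{Lip}}}} = \frac{\exp\left(g(\mathbb{E}_\mu[f]) + \frac{s^2 g'(\mathbb{E}_\mu[f])^2}{2(1 - s^2\|g'\|_{\mathrm{Lip}})}\right)}{\sqrt{1 - s^2\|g'\|_{\mathrm{Lip}}}},$$

as needed. □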



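Theorem 9 Let $X$ be an $\varepsilon$-geodesic metric space and $P$ be the transition kernel of a Markov chain. Assume that there exist a non-increasing function $K : \mathbb{R}_+ \to \mathbb{R}_+$ and a point $x_0 \in X$ satisfying:

$$K_\varepsilon(x) \ge K(d(x,x_0))$$

and that there exists $s > 0$ such that for any $x \in X$, any 1-Lipschitz function $f : X \to \mathbb{R}$ and any $\lambda \in \mathbb{R}$, we have:

$$\mathbb{E}_{P_x}\left[e^{\lambda f}\right] \le e^{\lambda \mathbb{E}_{P_x}[f] + \frac{\lambda^2 s^2}{2}}.$$

Let $F$ be defined as in Lemma 7. Then, for every pair $(a, d_0) \in \mathbb{R}_+^2$ satisfying:

• $d_0 \ge 2\varepsilon$,

• $F(d_0) > \sqrt{\frac{s^2 K(d_0)}{2}}$,

• $a < \frac{1}{s^2 K(d_0)}$,

• $C_{a,d_0} := \frac{\exp\left(-aF(d_0)^2\left(1 - \frac{as^2}{2(1 - as^2K(d_0))}\right)\right)}{\sqrt{1 - as^2K(d_0)}} < 1$,

we have the following concentration inequality for any equilibrium measure $\pi$ of the Markov chain and any $l \ge d_0$:

$$\mathbb{P}_{x\sim\pi}(d(x,x_0) \ge l) \le \frac{C'_{a,d_0}\, C_{a,d_0}}{1 - C_{a,d_0}}\, e^{-a(\varphi(l) - \varphi(d_0))}$$

where $\varphi(l) = \int_0^l F(u)\,du$, and

$$C'_{a,d_0} := \exp\left(a\,\mathbf{1}_{J(x_0)+\varepsilon > d_0 - F(d_0)} \int_{d_0 - F(d_0)}^{J(x_0)+\varepsilon} \sup(F(d_0), F(u))\,du\right).$$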

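Proof: We set $\psi(x) = a\varphi(x)$ if $x \ge d_0$ and $\psi(x) = a(\varphi(d_0) - (d_0 - x)F(d_0))$ if $x < d_0$. Under our assumptions, $\psi$ is convex and increasing, and we have $\|\psi'\|_{\mathrm{Lip}} = aK(d_0) < \frac{1}{s^2}$. Our goal is to bound the quantity $\mathbb{E}_{x\sim\pi}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right]$. We have:

$$\mathbb{E}_{x\sim\pi}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right] = \mathbb{E}_{x\sim\pi}\left[\mathbb{E}_{y\sim P_x}\left[e^{\psi(d(y,x_0))}\mathbf{1}_{d(y,x_0)\ge d_0}\right]\right] \le \mathbb{E}_{x\sim\pi}\left[\mathbb{E}_{y\sim P_x}\left[e^{\psi(d(y,x_0))}\right]\right].$$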

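Using Lemma 8 with $\mu = P_x$ and $g = \psi$, and Lemma 7, we get:

$$\begin{aligned}
\mathbb{E}_{x\sim\pi}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right] \le{} & \mathbb{E}_{x\sim\pi}\left[\frac{\exp\left(\psi\big(d(x,x_0) - F(d(x,x_0))\big) + \frac{a^2s^2F(d_0)^2}{2(1 - as^2K(d_0))}\right)}{\sqrt{1 - as^2K(d_0)}}\,\mathbf{1}_{d(x,x_0) < d_0}\right] \\
& + \mathbb{E}_{x\sim\pi}\left[\frac{\exp\left(\psi\big(d(x,x_0) - F(d(x,x_0))\big) + \frac{a^2s^2F(d(x,x_0))^2}{2(1 - as^2K(d_0))}\right)}{\sqrt{1 - as^2K(d_0)}}\,\mathbf{1}_{d(x,x_0) \ge d_0}\right].
\end{aligned}$$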



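The function $l \mapsto l - F(l)$ is nondecreasing on $[0,\varepsilon]$ and on $(\varepsilon, d_0]$, and $\psi$ is an increasing function. Then, for $d(x,x_0) < d_0$, we have $\psi(d(x,x_0) - F(d(x,x_0))) \le \psi(\max(J(x_0)+\varepsilon, d_0 - F(d_0))) = \ln(C'_{a,d_0}) + a(\varphi(d_0) - F^2(d_0))$.

For $d(x,x_0) \ge d_0$, we have $\psi(d(x,x_0) - F(d(x,x_0))) \le \psi(d(x,x_0)) - aF^2(d_0)$. So we get:

$$\mathbb{E}_{x\sim\pi}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right] \le C'_{a,d_0}\,C_{a,d_0}\,e^{a\varphi(d_0)} + C_{a,d_0}\,\mathbb{E}_{x\sim\pi}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right].$$

And then, since $C_{a,d_0} < 1$, we finally obtain:

$$\mathbb{E}_{x\sim\pi}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right] \le \frac{C'_{a,d_0}\,C_{a,d_0}\,e^{a\varphi(d_0)}}{1 - C_{a,d_0}}.$$

Now we just have to use the Markov inequality to derive the desired inequality. □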

Remark 10 In the previous proof, we didn't fully use the hypothesis $F(d_0) > \sqrt{\frac{s^2K(d_0)}{2}}$. In fact, for a fixed $d_0$, $\ln(C_{a,d_0})$ is a convex function of $a$ on the interval $\left[0, \frac{1}{s^2K(d_0)}\right)$. We have $C_{0,d_0} = 1$ and $\frac{\partial}{\partial a}\ln(C_{a,d_0})\big|_{a=0} < 0$ if and only if $F(d_0) > \sqrt{\frac{s^2K(d_0)}{2}}$. So if $0 < F(d_0) \le \sqrt{\frac{s^2K(d_0)}{2}}$, there doesn't exist any $a$ such that $C_{a,d_0} < 1$, and so the theorem wouldn't tell us anything at all.

Remark 11 If $K(d_0) < \frac{1}{2}$, we have $C_{\frac{2}{s^2},d_0} \ge 1$, so we must have $a < \frac{2}{s^2}$. Under the hypothesis that $K(x) \to 0$ and $F(x) \to +\infty$ as $x \to \infty$, we can find, for any $0 < a < \frac{2}{s^2}$, a $d_0$ such that $C_{a,d_0} < 1$. Of course we need a greater $d_0$ when $a$ gets closer to $\frac{2}{s^2}$.

One way to choose $a$ and $d_0$ is given by the following proof of Theorem 3.



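Proof of Theorem 3: We use Theorem 9 with $a = \frac{1}{2s^2}$ and $d_0 = 2\varepsilon + \frac{\ln(2)s^2}{\rho}$. We only have to check that in this case, $C_{a,d_0} \le e^{-\frac{\rho^2}{4s^2}}$ and $C'_{a,d_0} \le e^{\frac{3\varepsilon}{2s^2}\max\left(3\varepsilon,\, \rho + \frac{\ln(2)s^2}{\rho}\right)}$.

We have

$$\ln(C_{a,d_0}) = \left(-a + \frac{a^2s^2}{2(1 - as^2K(d_0))}\right)F^2(d_0) - \frac{1}{2}\ln(1 - as^2K(d_0)).$$

Since $K(d_0) \le 1$, we have $-a + \frac{a^2s^2}{2(1 - as^2K(d_0))} \le -\frac{1}{4s^2}$. We have $F(d_0) \ge \rho + \frac{\ln(2)s^2}{\rho}K(d_0)$, and then $F(d_0)^2 \ge \rho^2 + 2\ln(2)s^2K(d_0)$. Using the concavity of $\ln$ on $[\frac{1}{2}, 1]$, we get $\ln(1 - as^2K(d_0)) \ge -\ln(2)K(d_0)$. Thus we get:

$$\ln(C_{a,d_0}) \le -\frac{1}{4s^2}\left(\rho^2 + 2\ln(2)s^2K(d_0)\right) + \frac{\ln(2)}{2}K(d_0) = -\frac{\rho^2}{4s^2}.$$

For $C'_{a,d_0}$, we have

$$\ln(C'_{a,d_0}) = \frac{1}{2s^2}\,\mathbf{1}_{J(x_0)+\varepsilon > d_0 - F(d_0)} \int_{d_0 - F(d_0)}^{J(x_0)+\varepsilon} \max(F(d_0), F(u))\,du \le \frac{1}{2s^2}\big((J(x_0)+\varepsilon) - (d_0 - F(d_0))\big)_+ \max\big(F(d_0), F(J(x_0)+\varepsilon)\big).$$

By the triangle inequality for $W_1$, we have $J(x_0) \le W_1(\delta_{x_0}, P(x)) + W_1(P(x), P(x_0)) \le W_1(\delta_{x_0}, P(x)) + d(x_0,x)$ for any $x$, because the coarse Ricci curvature is nonnegative. If we take $x$ such that $\varepsilon < d(x_0,x) \le 2\varepsilon$, we have $J(x_0) \le 2d(x_0,x) - \rho \le 4\varepsilon - \rho$. We have $F(d_0) \le \rho + \frac{\ln(2)s^2}{\rho}$, so $d_0 - F(d_0) \ge 2\varepsilon - \rho$, and then $\big((J(x_0)+\varepsilon) - (d_0 - F(d_0))\big)_+ \le 3\varepsilon$. And finally, $F(J(x_0)+\varepsilon) \le F(5\varepsilon - \rho) \le 3\varepsilon$. Putting that together gives us the desired bound for $C'_{a,d_0}$. □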
Proof of Proposition 5: We take $a$ and $d_0$ as in the proof of Theorem 3. We consider the sequence of probability measures $P^n_{x_0}$. Then, doing as in the proof of Theorem 9, we have:

$$\mathbb{E}_{x\sim P^{n+1}_{x_0}}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right] \le C'_{a,d_0}\,C_{a,d_0}\,e^{a\varphi(d_0)} + C_{a,d_0}\,\mathbb{E}_{x\sim P^{n}_{x_0}}\left[e^{\psi(d(x,x_0))}\mathbf{1}_{d(x,x_0)\ge d_0}\right].$$

From that, we can conclude that there exists $C < +\infty$ such that for all $n$, we have $\mathbb{E}_{x\sim P^n_{x_0}}\left[e^{\psi(d(x,x_0))}\right] \le C$. So the sequence $P^n_{x_0}$ is tight, and then so is the sequence $\pi_n = \frac{1}{n}\sum_{k=0}^{n-1} P^k_{x_0}$. Because closed balls are compact, we can extract a weakly convergent subsequence $\pi_{\theta(n)}$, and we denote by $\pi$ its limit. The $W_1$ distance metrizes the weak convergence on the set of probability measures on $X$ satisfying $\mathbb{E}\left[e^{\psi(d(x,x_0))}\right] \le C$ (see jgj). Thus the subsequence $\pi_{\theta(n)}$ converges to $\pi$ for the $W_1$ distance. Furthermore, we have $W_1(\pi_n, P\pi_n) \le \frac{C}{n}$ with $C < \infty$ a constant. We have then

$$W_1(\pi, P\pi) \le W_1(\pi, \pi_{\theta(n)}) + W_1(\pi_{\theta(n)}, P\pi_{\theta(n)}) + W_1(P\pi_{\theta(n)}, P\pi).$$

The nonnegative coarse Ricci curvature implies that $P$ contracts the $W_1$ distance ([1]), so the third term of the right hand side is at most the first one. We have already seen that the first two terms tend to 0 when $n$ tends to $+\infty$. So the right hand side tends to 0 when $n$ tends to $+\infty$. Thus $W_1(\pi, P\pi) = 0$, and then $\pi$ is an invariant measure. □



2 Some examples 

Let us see which concentration we can get with Theorem 3 and Theorem 9 in some examples below.

Example 12 (Discrete time M/M/k queue (see, for example, [2]))

Let $0 < n_0 < k$ be two integers. We consider here the Markov chain on the integers with transition kernel:

$$p(n,n+1) = \frac{n_0}{n_0+k}, \qquad p(n,n) = \frac{(k-n)_+}{n_0+k}, \qquad p(n,n-1) = \frac{\min(n,k)}{n_0+k}, \qquad p(n,m) = 0 \text{ if } |n-m| > 1.$$

The origin $x_0$ we will consider to apply Theorem 3 is $n_0$, the only point at which the probability to jump to the left equals the probability to jump to the right (that is why we chose $n_0$ to be an integer). Hoeffding's Lemma (see [3]) states that for a random variable $X$ such that $a \le X \le b$ almost surely, we have $\mathbb{E}\left[e^{\lambda(X - \mathbb{E}[X])}\right] \le e^{\frac{\lambda^2(b-a)^2}{8}}$. So we can take $s = 1$ in Theorem 3. To compute the coarse Ricci curvature, we remark that if $x < y$, the measure $P_y$ stochastically dominates the measure $P_x$, and thus the $W_1$ distance between them is the difference of their expectations. For $x < y$, the coarse Ricci curvature $\kappa(x,y)$ is then $\frac{1}{n_0+k}$ if $y \le k$, $\frac{k-x}{(y-x)(n_0+k)}$ if $x < k < y$, and $0$ if $x \ge k$. If we take $\varepsilon = 1$, we have $\rho = \frac{1}{n_0+k}$ and $K(r) = \frac{\mathbf{1}_{r < k-n_0}}{n_0+k}$.
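The curvature values of this example can be verified with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the curvature is computed through the difference of expectations, which is valid here for $x < y$ thanks to the stochastic domination noted above.

```python
from fractions import Fraction


def queue_kernel(n0, k):
    """Transition kernel of the discrete time M/M/k queue of Example 12,
    returned as a function n -> {site: probability} with exact fractions."""
    z = n0 + k

    def kernel(n):
        law = {n + 1: Fraction(n0, z)}
        stay = Fraction(max(k - n, 0), z)
        down = Fraction(min(n, k), z)
        if stay:
            law[n] = stay
        if down:
            law[n - 1] = down
        return law

    return kernel


def mean(law):
    """Expectation of a law given as {site: probability}."""
    return sum(site * mass for site, mass in law.items())


def kappa_monotone(kernel, x, y):
    """Coarse Ricci curvature for x < y when P_y stochastically dominates P_x,
    so that W1(P_x, P_y) is the difference of the expectations."""
    return 1 - (mean(kernel(y)) - mean(kernel(x))) / (y - x)


if __name__ == "__main__":
    P = queue_kernel(3, 7)
    print(kappa_monotone(P, 2, 5))  # both points below k
    print(kappa_monotone(P, 5, 9))  # x < k < y
    print(kappa_monotone(P, 8, 9))  # both points above k
```

With $n_0 = 3$ and $k = 7$, the three cases of the formula for $\kappa(x,y)$ give $\frac{1}{10}$, $\frac{1}{20}$ and $0$ respectively.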

Applying Theorem 3 should give a Gaussian-then-exponential concentration but, as $\rho$ is very small, $d_0$ is large ($2 + (n_0+k)\ln(2)$). If $k - n_0 < \frac{2 + 2n_0\ln(2)}{1 - \ln(2)}$, we get only the exponential part. If $k$ is too large, $d_0$ is large too, and the Gaussian-then-exponential bound starts far away from $n_0$. We can try to take a larger $\varepsilon$ to get a better $\rho$. Indeed, we get $\rho = \frac{\min(\varepsilon,\, k-n_0)}{n_0+k}$, but we pay for that with a worse curvature $K(r) = \frac{1}{n_0+k}\min\left(1, \max\left(0, \frac{k-n_0-r}{\varepsilon}\right)\right)$. We distinguish 3 cases depending on how large $k - n_0$ is with respect to $n_0$.

When $k - n_0$ is between $\sqrt{n_0}$ and $n_0$, the equilibrium measure is well approximated by a Gaussian between $0$ and $k$. The optimal $\varepsilon$ is $O(\sqrt{n_0})$, the coefficient of the Gaussian part of the concentration inequality is $O(\frac{1}{n_0})$, which is good, and the coefficient of the exponential part is $O(\frac{k-n_0}{n_0})$, like the right one.

When $k - n_0$ is $o(\sqrt{n_0})$, the mass of $[0,k]$ under the equilibrium measure is negligible with respect to the one of $[k,\infty)$. The optimal $\varepsilon$ and $d_0$ are $O(k - n_0)$. This time, we have no Gaussian part because $d_0$ is too large (and indeed, there is no Gaussian part in the equilibrium measure), and the coefficient of the exponential part is about one half of the right one.

When $k - n_0$ is greater than $n_0$, the equilibrium measure is almost the Poissonian one with parameter $n_0$. The density of the equilibrium measure is illustrated below:

[Figure: density of the equilibrium measure]

The optimal $\varepsilon$ and $d_0$ are $O(\sqrt{k})$, the coefficient appearing in the Gaussian part is $O(\frac{1}{k})$, instead of an expected $\frac{1}{2n_0}$, and the coefficient of the exponential part is $O(1)$, which is clearly not optimal, so Theorem 3 gives a rather bad concentration inequality.

Example 13 (Discrete time Ornstein Uhlenbeck) Let $0 < \alpha < 1$ be a real parameter. Here we consider the Markov chain on $\mathbb{R}$ given by the transition kernel:

$$P_x = \mathcal{N}((1-\alpha)x, 1).$$

It is shown in pQ that in the Gaussian case, we can take the 
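variance of the distribution for $s^2$. So we take $s^2 = 1$, and for every $\varepsilon > 0$, the curvature is constant, $K = \alpha$. We have

$$\rho = \varepsilon - \int_{-\infty}^{+\infty} |(1-\alpha)\varepsilon - x|\, \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}\,dx \ge -\sqrt{\frac{2}{\pi}} + \alpha\varepsilon.$$

Theorem 3, applied with a suitable $\varepsilon$, gives us Gaussian concentration with coefficient $\frac{\alpha}{4}$ instead of $\frac{\alpha(2-\alpha)}{2}$ (so we have a loss of a factor between 2 and 4), and $d_0 = O\left(\sqrt{\frac{1}{\alpha}}\right)$.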
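The comparison with the true invariant measure can be made explicit. For the kernel $P_x = \mathcal{N}((1-\alpha)x, 1)$, the invariant law is a centered Gaussian whose variance $v$ solves $v = (1-\alpha)^2 v + 1$, that is $v = \frac{1}{\alpha(2-\alpha)}$, so the true Gaussian coefficient is $\frac{1}{2v} = \frac{\alpha(2-\alpha)}{2}$. The sketch below iterates the variance recursion and compares with a coefficient $\frac{\alpha}{4}$, which is what a bound of the form $e^{-\Phi(l)/(2s^2)}$ with $K = \alpha$ and $s^2 = 1$ yields to leading order; the function names are illustrative assumptions.

```python
def invariant_variance(alpha, tol=1e-14, max_iter=10 ** 6):
    """Fixed point of v -> (1 - alpha)**2 * v + 1, the variance of the
    invariant law of the chain x -> (1 - alpha) * x + standard Gaussian noise."""
    v = 0.0
    for _ in range(max_iter):
        new_v = (1.0 - alpha) ** 2 * v + 1.0
        if abs(new_v - v) < tol:
            return new_v
        v = new_v
    return v


def loss_factor(alpha):
    """Ratio between the true Gaussian coefficient 1 / (2 v) and alpha / 4."""
    true_coefficient = 1.0 / (2.0 * invariant_variance(alpha))
    return true_coefficient / (alpha / 4.0)


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, invariant_variance(alpha), loss_factor(alpha))
```

The ratio equals $2(2-\alpha)$, which indeed lies between 2 and 4 for $0 < \alpha < 1$.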

The bad behaviour of $s^2$ prevents an easy generalization of Theorem 3 or Theorem 9 to continuous time. The following example of a continuous time process, whose generator merges a diffusive part and a jump part, shows that a generalization of those theorems does not hold, even if the jump rate is uniformly bounded.

Example 14 Consider a continuous time process on $\mathbb{R}_+$ with a linear drift towards $0$ and a random jump to the right of size 1 and rate 1. The generator of this process is given by $Lf(x) = -\alpha x f'(x) + f(x+1) - f(x)$, with $\alpha > 0$ a constant which quantifies the drift.

In this example, the coarse Ricci curvature is $\alpha$. Indeed, using the coupling of the processes $X_t$ and $Y_t$ starting at $x$ and $y$ such that $X_t$ and $Y_t$ jump at the same times shows that the law of $Y_t$ is the translation of the law of $X_t$ by $(y-x)e^{-\alpha t}$. If something like Theorem 3 or Theorem 9 did hold, we would have Gaussian concentration. But actually there is only Poissonian concentration. Let us prove there is Poissonian concentration and no better. We denote by $X_T$ the value of the process at the time $T$. Let $T_1, T_2, \dots$ be the successive times of the jumps. For all $T > 0$, let $N(T)$ be the number of jumps between $0$ and $T$. We have

$$X_T = e^{-\alpha T}X_0 + \sum_{i=1}^{N(T)} e^{-\alpha(T - T_i)}.$$

If we take $X_0 = 0$ then $\mathbb{E}[X_T] \le T$, and since the coarse Ricci curvature is greater than $\alpha > 0$, there exists a unique invariant probability measure $\pi$ (see [5]).

Now take $X_0$ with the law $\pi$. Then $X_1$ has the law $\pi$, and is greater than $e^{-\alpha}N(1)$, which has a Poissonian concentration since $N(1)$ follows precisely a Poisson law of parameter 1. So we cannot have a better concentration than a Poissonian one.

It remains to prove that $\pi$ has Poissonian concentration. We take $X_0 = 0$. Let us consider the Laplace/Fourier transform of $X_T$, that is $G_T(\lambda) := \mathbb{E}[e^{\lambda X_T}]$ for $\lambda \in \mathbb{C}$. $N(T)$ has the Poisson law $\mathcal{P}(T)$, and the distribution of the $T_i$'s knowing $N(T)$ is the one of $N(T)$ independent random variables uniformly distributed in $[0,T]$. So we have:

$$G_T(\lambda) = \sum_{n=0}^{\infty} e^{-T}\frac{T^n}{n!}\left(\frac{1}{T}\int_0^T e^{\lambda e^{-\alpha t}}\,dt\right)^n = \exp\left(\int_0^T \left(e^{\lambda e^{-\alpha t}} - 1\right)dt\right) = \exp\left(\frac{I(\lambda) - I(\lambda e^{-\alpha T})}{\alpha}\right)$$

with $I(\lambda) = \sum_{k=1}^{\infty} \frac{\lambda^k}{k\,k!} = \int_0^\lambda \frac{e^z - 1}{z}\,dz$. We see that $G_T(\lambda)$ tends to a limit $G(\lambda)$, which is the Laplace/Fourier transform of $\pi$, when $T$ tends to $+\infty$.

We have $G(\lambda) = e^{\frac{I(\lambda)}{\alpha}}$. An integration by parts gives us $I(\lambda) = \frac{e^\lambda - \lambda - 1}{\lambda} + \int_0^\lambda \frac{e^z - z - 1}{z^2}\,dz$, so $I(\lambda) \sim \frac{e^\lambda}{\lambda}$ as $\lambda \to +\infty$. For $l > 1$, we use the Markov inequality on $e^{\ln(l)X}$ and get:

$$\mathbb{P}_\pi[X \ge l] \le e^{\frac{I(\ln(l))}{\alpha} - l\ln(l)},$$

and we have $\frac{I(\ln(l))}{\alpha} \sim \frac{l}{\alpha\ln(l)} = o(l\ln(l))$. So we have Poissonian concentration.
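The quantities of this last computation are easy to evaluate numerically. The following sketch (function names and numerical parameters are illustrative assumptions) computes $I(\lambda)$ both from its power series $\sum_{k \ge 1} \frac{\lambda^k}{k\,k!}$ and from the integral $\int_0^\lambda \frac{e^z - 1}{z}\,dz$, and then evaluates the logarithm of the Markov bound $\frac{I(\ln(l))}{\alpha} - l\ln(l)$.

```python
import math


def I_series(lam, terms=200):
    """I(lam) = sum_{k >= 1} lam**k / (k * k!), truncated after `terms` terms."""
    total, term = 0.0, 1.0  # term holds lam**k / k!
    for k in range(1, terms + 1):
        term *= lam / k
        total += term / k
    return total


def I_integral(lam, steps=20000):
    """I(lam) = int_0^lam (e^z - 1) / z dz, by the midpoint rule."""
    h = lam / steps
    return h * sum(math.expm1((i + 0.5) * h) / ((i + 0.5) * h) for i in range(steps))


def log_tail_bound(l, alpha):
    """Logarithm of the Markov bound on P_pi[X >= l], for l > 1."""
    lam = math.log(l)
    return I_series(lam) / alpha - l * lam


if __name__ == "__main__":
    print(I_series(2.0), I_integral(2.0))
    for l in (10.0, 50.0, 200.0):
        print(l, log_tail_bound(l, alpha=1.0))
```

The term $-l\ln(l)$ dominates for large $l$, which is the Poissonian decay announced above.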